Scientific Python antipatterns advent calendar day six
For today, a slightly more complicated example that looks at program design. As a reminder, I’ll post one tiny example per day with the intention that they should only take a couple of minutes to read.
If you want to read them all but can’t be bothered checking this website each day, sign up for the mailing list and I’ll send a single email at the end with links to them all.
Classes without methods
Python is sometimes called a multiparadigm programming language, which means that we can write programs in different styles, including an object-oriented style. Objects are a convenient way to package up data and behaviour for larger programs, but are often misused as pure data structures.
Imagine we want to store some data from my favourite example dataset, the Palmer penguins. For each penguin we need to store a species name, a flipper length, and a body mass. This feels like an object problem, so we might be tempted to write a class definition:
class Penguin:
    # flipper length in mm, body mass in g
    def __init__(self, species, flipper_length, body_mass):
        self.species = species
        self.flipper_length = flipper_length
        self.body_mass = body_mass
and then create some objects:
p1 = Penguin('Adelie', 180, 3750)
p2 = Penguin('Adelie', 160, 4000)
p3 = Penguin('Chinstrap', 198, 3200)
and then do some data processing:
# select all penguins heavier than 3.5 kg
heavy_penguins_flipper_lengths = []
for penguin in [p1, p2, p3]:
    if penguin.body_mass > 3500:
        heavy_penguins_flipper_lengths.append(penguin.flipper_length)
heavy_penguins_flipper_lengths
[180, 160]
This design works, but doesn’t really take advantage of the power of classes. In our class definition, we have data, but no behaviour, so we are incurring the extra computational overhead and code complexity of a class definition for no real benefit.
There are several better options. One is to simply use a list of tuples to store the data:
penguins = [
    ('Adelie', 180, 3750),
    ('Adelie', 160, 4000),
    ('Chinstrap', 198, 3200)
]
and use tuple unpacking when processing them:
heavy_penguins_flipper_lengths = []
for penguin in penguins:
    species, flipper_length, body_mass = penguin
    if body_mass > 3500:
        heavy_penguins_flipper_lengths.append(flipper_length)
heavy_penguins_flipper_lengths
[180, 160]
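As a small aside, the unpacking can also happen directly in the loop header, or the whole filter can be written as a list comprehension. This is only a sketch of an alternative spelling that uses the same list of tuples; the name `heavy_as_comprehension` is made up for illustration.

```python
penguins = [
    ('Adelie', 180, 3750),
    ('Adelie', 160, 4000),
    ('Chinstrap', 198, 3200),
]

# unpack each tuple directly in the loop header
heavy_penguins_flipper_lengths = []
for species, flipper_length, body_mass in penguins:
    if body_mass > 3500:
        heavy_penguins_flipper_lengths.append(flipper_length)

# the same filter written as a list comprehension
# (the underscore marks the species field as unused)
heavy_as_comprehension = [
    flipper_length
    for _, flipper_length, body_mass in penguins
    if body_mass > 3500
]
```

Either way, the code still depends on remembering the order of the fields in each tuple.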
Having to explicitly unpack the tuple into individual variables like this:
species, flipper_length, body_mass = penguin
every time we want to use them is annoying, so an even better option would be to make a named tuple:
from collections import namedtuple
# flipper length in mm, body mass in g
Penguin = namedtuple("Penguin", ["species", "flipper_length", "body_mass"])
This allows us to construct our penguin objects just like before:
penguins = [
    Penguin('Adelie', 180, 3750),
    Penguin('Adelie', 160, 4000),
    Penguin('Chinstrap', 198, 3200)
]
and use the attribute names without unpacking:
heavy_penguins_flipper_lengths = []
for penguin in penguins:
    if penguin.body_mass > 3500:
        heavy_penguins_flipper_lengths.append(penguin.flipper_length)
heavy_penguins_flipper_lengths
[180, 160]
but without having to write a full class definition, or incur the computational overhead of custom classes - the data are still stored internally very efficiently as a tuple.
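To make the "still a tuple" point concrete, here is a short sketch using the same `Penguin` named tuple. The `_replace` and `_asdict` methods are part of the standard `collections.namedtuple` API; the other variable names are just for illustration.

```python
from collections import namedtuple

# flipper length in mm, body mass in g
Penguin = namedtuple("Penguin", ["species", "flipper_length", "body_mass"])

p = Penguin('Adelie', 180, 3750)

# a named tuple is still a tuple, so indexing and unpacking keep working
is_a_tuple = isinstance(p, tuple)
first_field = p[0]
species, flipper_length, body_mass = p

# like any tuple it is immutable: assigning to a field raises AttributeError
try:
    p.body_mass = 4000
    mutation_allowed = True
except AttributeError:
    mutation_allowed = False

# _replace returns a new object with some fields changed
heavier = p._replace(body_mass=4000)

# _asdict converts to a dictionary keyed by field name
as_dict = p._asdict()
```

The immutability is worth keeping in mind: if the measurements need to be updated in place, a named tuple is probably not the right container.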
One downside of the named tuple approach is that if we later decide that we want to add some methods, we have to find the named tuple definition and replace it with a class definition. An alternative to the named tuple would be a dataclass:
from dataclasses import dataclass
@dataclass
class Penguin:
    species: str
    flipper_length: int  # in mm
    body_mass: int  # in g
This gives us an alternative way of defining classes that will be used mostly for storing attributes, and we can still use our nice syntax as before:
penguins = [
    Penguin('Adelie', 180, 3750),
    Penguin('Adelie', 160, 4000),
    Penguin('Chinstrap', 198, 3200)
]
heavy_penguins_flipper_lengths = []
for penguin in penguins:
    if penguin.body_mass > 3500:
        heavy_penguins_flipper_lengths.append(penguin.flipper_length)
heavy_penguins_flipper_lengths
[180, 160]
but keeps the option of adding methods to the class later on if we want to:
@dataclass
class Penguin:
    species: str
    flipper_length: int  # in mm
    body_mass: int  # in g

    def body_mass_kg(self):
        return self.body_mass / 1000.0

p = Penguin(species="Adelie", flipper_length=181, body_mass=3750)
p.body_mass_kg()
3.75
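Part of what the `@dataclass` decorator buys us is that it writes `__init__`, `__repr__`, and `__eq__` for us. The following sketch shows that with the same `Penguin` fields; `asdict` is a helper from the standard `dataclasses` module, and the variable names are only illustrative.

```python
from dataclasses import dataclass, asdict

@dataclass
class Penguin:
    species: str
    flipper_length: int  # in mm
    body_mass: int  # in g

p1 = Penguin('Adelie', 180, 3750)
p2 = Penguin('Adelie', 180, 3750)

# the generated __repr__ shows the field names and values
description = repr(p1)

# the generated __eq__ compares the objects field by field
same_measurements = (p1 == p2)

# unlike a named tuple, a dataclass instance is mutable by default
p2.body_mass = 4000

# asdict converts an instance to a plain dictionary
as_dict = asdict(p1)
```

Note that the type annotations are not checked at runtime, so they act as documentation rather than validation.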
One final option: if we are working in an environment where we can install packages, we could use an existing library to handle the data storage. If we represented our penguins as rows in a pandas dataframe:
import pandas as pd
penguins = pd.DataFrame(
    [
        ("Adelie", 180, 3750),
        ("Adelie", 160, 4000),
        ("Chinstrap", 198, 3200),
    ],
    columns=["species", "flipper_length", "body_mass"],
)
penguins
| | species | flipper_length | body_mass |
|---|---|---|---|
| 0 | Adelie | 180 | 3750 |
| 1 | Adelie | 160 | 4000 |
| 2 | Chinstrap | 198 | 3200 |
then we would have access to all the usual pandas tools for filtering, etc.:
penguins[penguins['body_mass'] > 3500]['flipper_length']
0    180
1    160
Name: flipper_length, dtype: int64
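The same selection can also be written with `.loc`, which filters the rows and picks the column in one step. This is a sketch of an alternative spelling, assuming pandas is installed; it rebuilds the same small dataframe so that it runs on its own, and the result names are made up for illustration.

```python
import pandas as pd

penguins = pd.DataFrame(
    [
        ("Adelie", 180, 3750),
        ("Adelie", 160, 4000),
        ("Chinstrap", 198, 3200),
    ],
    columns=["species", "flipper_length", "body_mass"],
)

# boolean mask selects the rows, the column label selects the column
heavy_flipper_lengths = penguins.loc[penguins["body_mass"] > 3500, "flipper_length"]

# convert back to a plain Python list if needed
heavy_flipper_lengths_list = heavy_flipper_lengths.tolist()

# grouped summaries are also short, e.g. mean body mass per species
mean_mass_by_species = penguins.groupby("species")["body_mass"].mean()
```

Once the data are in a dataframe, summaries like the grouped mean come almost for free, which none of the hand-written containers offer.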
One more time: if you want to see the rest of these little write-ups, sign up for the mailing list.